Edward Vytlacil
General Idea of Monte Carlo Analysis (for this course)
MC simulations used to:
approximate probabilities and expectations that are hard (or infeasible) to derive analytically.
Shooter rolls two dice,
If sum of dice is 7 or 11, shooter wins,
If sum of dice is 2, 3, or 12, casino wins,
Otherwise, shooter keeps rolling until either matching the first sum (shooter wins) or rolling a 7 (casino wins).
What is the probability that shooter wins?
In this example, could use probability theory to derive that the probability of winning is \(244/495 \approx 0.4929\).
But might be easier to use MC simulation than to do the derivation.
Not feasible to derive solution in many examples.
Let’s start with simpler problem:
what is the probability that shooter wins on first roll?
Can derive that probability equals \(2/9 \approx 0.2222\).
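The derivation is short: of the \(36\) equally likely outcomes for two dice, six sum to \(7\) and two sum to \(11\), so
\[
\Pr[\mbox{win on first roll}] = \Pr[\mbox{sum}=7] + \Pr[\mbox{sum}=11] = \frac{6}{36} + \frac{2}{36} = \frac{8}{36} = \frac{2}{9} \approx 0.2222.
\]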
Now we use Monte Carlo simulation to approximate that probability.
First step: simulating one roll of a die:
sample(1:6,1) randomly draws one element from {1,2,3,4,5,6}, each element equally likely, one time.
Computers are deterministic.
To create truly random numbers, can use a physical process, e.g., radioactive decay.
Not necessary for our purposes, and not what R is doing…
R is generating pseudo-random numbers
You may obtain different pseudo-random numbers each time you run your code, so neither you nor others can replicate your results.
For replication purposes, can use set.seed to fix the starting point of the pseudo-random number generator.
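For example (a minimal sketch; the seed value 123 is arbitrary):

```r
# fix the seed so the pseudo-random draws are reproducible
set.seed(123)
sample(1:6, 1)

# resetting the same seed reproduces the identical draw
set.seed(123)
sample(1:6, 1)
```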
Now simulating rolling two dice:
sample(1:6,2,replace=TRUE) randomly draws two elements from {1,2,3,4,5,6}, each element equally likely, with replacement.
Shooter wins on first roll if sum of dice is 7 or 11.
How to approximate the probability of winning on the first roll?
Simulate many rolls, find what fraction of rolls result in a sum of 7 or 11.
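A sketch of that simulation (here with only 10 simulated first rolls):

```r
# simulate 10 first rolls; record whether each wins (sum is 7 or 11)
wins <- replicate(10, sum(sample(1:6, 2, replace = TRUE)) %in% c(7, 11))
wins        # logical vector of length 10
mean(wins)  # fraction of simulated first rolls that won
```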
[1] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
True probability win on first roll: \(2/9 \approx 0.222\).
Fraction of 10 simulations where won: 0.1.
Additional problem:
Can get a very different approximation if we rerun the code:
[1] 0.25
Recall Convergence in Probability
We say a sequence \(\bar{X}_n\) converges in probability to \(\mu\), written \(\bar{X}_n \stackrel{p}{\rightarrow} \mu\), if, for every \(\epsilon>0\), \[ \lim_{n \rightarrow \infty} \Pr[ | \bar{X}_n - \mu | \ge \epsilon] = 0.\]
If \(\bar{X}_n \stackrel{p}{\rightarrow} \mu\), we say that \(\bar{X}_n\) is a consistent estimator of \(\mu\).
Recall LLN for i.i.d. data.
When \(X_i\) is an indicator variable, \(\mathbb{E}[X_i]=p\) with \(0 \le p \le 1\), so \(\mu = \mathbb{E}[X_i]\) always exists and is finite. Thus, by the LLN, \(\bar{X}_n \stackrel{p}{\rightarrow} p\), i.e., \(\bar{X}_n\) approximates \(p\) arbitrarily well for \(n\) sufficiently large.
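A quick illustration of the LLN in this setting (a sketch; the replication counts are arbitrary): the fraction of simulated first-roll wins approaches \(2/9 \approx 0.2222\) as the number of replications grows.

```r
set.seed(2023)  # arbitrary seed for reproducibility
for (n in c(100, 10000, 100000)) {
  # indicator of winning on the first roll, replicated n times
  wins <- replicate(n, sum(sample(1:6, 2, replace = TRUE)) %in% c(7, 11))
  cat(n, "replications:", mean(wins), "\n")  # approaches 2/9 as n grows
}
```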
What is the probability that shooter wins pass line bet in craps?
First, write a function to simulate the pass line bet.
flowchart LR
subgraph 1st Roll
A(Roll <br> Dice) --> B[Roll <br> 7 or 11?]
B -->|Yes| C{Win}
B -->|No| D[Roll 2 <br> 3 or 12?]
D -->|Yes| E{Lose}
end
D -->|No| F(Roll <br> Again)
subgraph Roll Again
F --> G[Roll 7?]
G -->|Yes| H{Lose}
G -->|No| I[Roll = <br> 1st roll?]
I -->|Yes| J{Win}
I -->|No| F
end
f.craps <- function(){
# simulate first roll of dice
sum0 <- sum(sample(1:6,2,replace=TRUE))
# determine if win/lose on first roll
if (sum0 %in% c(7, 11)){
Win <- 1
} else if (sum0 %in% c(2, 3, 12)){
Win <- 0
} else{
# determine outcome if don't win/lose on first roll
while(TRUE){
# keep rolling until someone wins (loop body filled in below)
}
}
return(Win)
}

f.craps <- function(){
# simulate first roll of dice
sum0 <- sum(sample(1:6,2,replace=TRUE))
# determine if win/lose on first roll
if (sum0 %in% c(7, 11)){
Win <- 1
} else if (sum0 %in% c(2, 3, 12)){
Win <- 0
} else{
# determine outcome if don't win/lose on first roll
while(TRUE){
# keep rolling until someone wins
sum1 <- sum(sample(1:6,2,replace=TRUE))
if (sum1 == sum0){
Win <- 1
break
} else if (sum1 == 7){
Win <- 0
break
}
}
}
return(Win)
}

Common distributions in R:
| Distribution | Function | Returns |
|---|---|---|
| Binomial | rbinom(n,size,prob) | draw \(n\) times from \(\mbox{Binom}(\mbox{size},\mbox{prob})\) |
| Normal | rnorm(n,mean=0,sd=1) | draw \(n\) times from \(N(\mbox{mean},\mbox{sd})\) |
| Student-t | rt(n,df) | draw \(n\) times from \(t_{df}\) distribution |
| Uniform | runif(n,min=0,max=1) | draw \(n\) times from \(\mbox{Unif}[\mbox{min},\mbox{max}]\) |
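For example (a sketch; the parameter values are arbitrary):

```r
set.seed(1)                       # for reproducibility
rbinom(3, size = 10, prob = 0.5)  # 3 draws from Binom(10, 0.5)
rnorm(3, mean = 0, sd = 1)        # 3 draws from N(0, 1)
rt(3, df = 5)                     # 3 draws from t with 5 degrees of freedom
runif(3, min = 0, max = 1)        # 3 draws from Unif[0, 1]
```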
To approximate \(\mathbb{E}[f(X)]\) for some function \(f\):
Let \(X\) denote random variable (or random vector).
Let \(X_1, . . . ,X_R\) denote \(R\) replications of \(X\).
Construct \(f(X_1), . . . ,f(X_R)\).
Let \(\overline{f(X)}_R\) denote the sample mean of \(f(X_j)\) across the \(R\) replications.
Use \(\overline{f(X)}_R\) to approximate \(\mathbb{E}[f(X)]\).
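The recipe above as a generic sketch, here approximating \(\mathbb{E}[X^2]=1\) for \(X \sim N(0,1)\) (the names f and R, and the seed, are illustrative):

```r
set.seed(42)           # arbitrary seed
R <- 100000            # number of replications
f <- function(x) x^2   # function whose expectation we approximate
X <- rnorm(R)          # R replications of X ~ N(0,1)
mean(f(X))             # approximates E[X^2] = Var(X) = 1
```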
Often we sample \(X\) where \(X\) itself is determined by an underlying sample.
For example, consider simulating the mean of \(N\) coin flips, i.e., \(X=\bar{Y}_N\) where each \(Y_i \sim \mbox{Bernoulli}(1/2)\).
Create \(R\) replication samples, each a sample of \(N\) coin flips.
Let \(X_j = \bar{Y}_{j,N}\), sample mean of \(N\) coin flips on replication sample \(j\).
Proceed as before.
Note two dimensions, \(N\) (size of each sample) and \(R\) (number of replications).
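A sketch with both dimensions explicit (the values of N and R, and the seed, are arbitrary):

```r
set.seed(7)
N <- 100    # size of each sample (coin flips per replication)
R <- 10000  # number of replication samples
# each column is one replication sample of N Bernoulli(1/2) coin flips
flips <- matrix(rbinom(N * R, size = 1, prob = 0.5), nrow = N)
Xbar  <- colMeans(flips)  # X_j = mean of N coin flips in replication j
mean(Xbar)  # approximates E[Ybar_N] = 1/2
sd(Xbar)    # approximates sd(Ybar_N) = 1/(2*sqrt(N)) = 0.05
```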
Let \(\bar{Y}_{10}\) be the mean of \(10\) fair coin flips. Using \(100,000\) replications,
Let \(\bar{Y}_{100}\) be the mean of \(100\) fair coin flips. Using \(100,000\) replications,
Let \(\bar{Y}_{1000}\) be the mean of \(1,000\) fair coin flips. Using \(100,000\) replications,
Let \(\bar{Y}_{10000}\) be the mean of \(10,000\) fair coin flips. Using \(100,000\) replications,
Switching to blackboard...
Econ 123: Intermediate Econometrics and Data Analysis